Skip to content

Add isnotempty PPL function#5193

Closed
lucacavenaghi97 wants to merge 1 commit into
opensearch-project:mainfrom
lucacavenaghi97:add-isnotempty-function
Closed

Add isnotempty PPL function#5193
lucacavenaghi97 wants to merge 1 commit into
opensearch-project:mainfrom
lucacavenaghi97:add-isnotempty-function

Conversation

@lucacavenaghi97

Copy link
Copy Markdown

Description

Adds isnotempty(field) to PPL. Returns true when a field is not null and not empty string — logical negation of isempty(field).

Resolves #5182

Changes

  • Added ISNOTEMPTY token and parser rules in ppl, language-grammar, and async-query-core grammars
  • Added IS_NOT_EMPTY enum constant to BuiltinFunctionName
  • Registered Calcite implementation as NOT(IS_NULL(arg) OR IS_EMPTY(arg)) in PPLFuncImpTable
  • Added testIsNotEmpty integration test
  • Added docs

@github-actions

github-actions Bot commented Feb 27, 2026

Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

(Review updated until commit 5eed3f2)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Grammar Inconsistency

isNotEmptyExpression is added as a separate booleanExpression alternative, but isEmptyExpression groups both ISEMPTY and ISBLANK together. The ISNOTEMPTY token is also added to textFunctionName, which may allow it to be used in non-boolean contexts inconsistently. Consider whether isNotEmptyExpression should be merged into isEmptyExpression (e.g., (ISEMPTY | ISBLANK | ISNOTEMPTY)) for consistency, or if the separation is intentional.

isNotEmptyExpression
  : ISNOTEMPTY LT_PRTHS functionArg RT_PRTHS
  ;
Misleading Doc Example

In the documentation example, the last row shows isnotempty(temp) as True with temp value being empty string ' ' (spaces). However, isnotempty should return False for whitespace-only strings only if it behaves like isblank. The current implementation is NOT(IS_NULL OR IS_EMPTY), so whitespace strings would return True. The table entry shows an empty cell for temp which may confuse readers — it should clearly show the whitespace value or be clarified in the description.

| True             | Pyrami  | True                 | Pyrami   |
| True             | Netagy  | True                 | Netagy   |
| True             | Quility | True                 | Quility  |
| True             |         | False                | null     |
+------------------+---------+----------------------+----------+
Implementation Duplication

The IS_NOT_EMPTY implementation duplicates the IS_EMPTY logic wrapped in a NOT. Consider whether a helper or reuse of the IS_EMPTY registration result is possible to reduce duplication and keep the two implementations in sync if IS_EMPTY logic ever changes.

register(
    IS_NOT_EMPTY,
    (FunctionImp1)
        (builder, arg) ->
            builder.makeCall(
                SqlStdOperatorTable.NOT,
                builder.makeCall(
                    SqlStdOperatorTable.OR,
                    builder.makeCall(SqlStdOperatorTable.IS_NULL, arg),
                    builder.makeCall(SqlStdOperatorTable.IS_EMPTY, arg))),
    PPLTypeChecker.family(SqlTypeFamily.ANY));

@github-actions

github-actions Bot commented Feb 27, 2026

Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Latest suggestions up to 5eed3f2

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Remove function from incorrect grammar rule

ISNOTEMPTY is added to textFunctionName in the async-query parser, but isnotempty is
a boolean/condition function, not a text function. This is inconsistent with the
semantic meaning of the function and may cause it to be treated as a text function
in certain parsing contexts. It should only appear in the appropriate
condition-related rule.

async-query-core/src/main/antlr/OpenSearchPPLParser.g4 [868-869]

-| ISNOTEMPTY
-   | ISBLANK
+| ISBLANK
    ;
Suggestion importance[1-10]: 5

__

Why: The suggestion raises a valid concern about ISNOTEMPTY being placed in textFunctionName when it's a boolean/condition function. However, looking at the PR diff, ISEMPTY and ISBLANK are also in textFunctionName, so this pattern is consistent with existing code. The suggestion would require removing ISEMPTY and ISBLANK as well for consistency, making this a broader refactoring concern.

Low
General
Fix misleading empty display in documentation table

The last row in the example table shows isnotempty(temp) as True for an empty string
' ' (spaces), but isnotempty should return False for a whitespace-only string if
it follows the same logic as isempty (which returns true for null or empty string).
However, since isnotempty is defined as NOT of isempty (null OR empty string), a
whitespace string ' ' is not null and not empty, so isnotempty would return True.
The temp column value shown as empty in the table is misleading — it should display
the actual whitespace string ' ' to avoid confusion.

docs/user/ppl/functions/condition.md [674]

-| True             |         | False                | null     |
+| True             | '   '   | False                | null     |
Suggestion importance[1-10]: 4

__

Why: The suggestion is valid - displaying an empty cell instead of the whitespace string ' ' could be confusing to readers. However, this is a minor documentation clarity issue and the current behavior is technically correct since isnotempty returns True for whitespace strings.

Low

Previous suggestions

Suggestions up to commit 0ecc7f1
CategorySuggestion                                                                                                                                    Impact
General
Extract duplicated IS_EMPTY logic

The IS_NOT_EMPTY implementation duplicates the logic from IS_EMPTY. Consider
extracting the IS_EMPTY logic into a reusable method and negating it, which would
improve maintainability and reduce code duplication. This ensures both functions
stay synchronized if the empty check logic needs to change.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java [1256-1266]

 register(
     IS_NOT_EMPTY,
     (FunctionImp1)
         (builder, arg) ->
             builder.makeCall(
                 SqlStdOperatorTable.NOT,
-                builder.makeCall(
-                    SqlStdOperatorTable.OR,
-                    builder.makeCall(SqlStdOperatorTable.IS_NULL, arg),
-                    builder.makeCall(SqlStdOperatorTable.IS_EMPTY, arg))),
+                makeIsEmptyCall(builder, arg)),
     PPLTypeChecker.family(SqlTypeFamily.ANY));
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly identifies code duplication between IS_NOT_EMPTY and IS_EMPTY implementations. However, the proposed makeIsEmptyCall method doesn't exist in the codebase, and the suggestion doesn't show how to implement it. While refactoring would improve maintainability, the current implementation is functionally correct and the improvement is moderate.

Low
Suggestions up to commit 88cfad4
CategorySuggestion                                                                                                                                    Impact
General
Extract common expression to variable

The isnotempty implementation duplicates the isempty logic inline. Consider
extracting the common OR expression into a variable or reusing the existing IS_EMPTY
registration to avoid code duplication and improve maintainability.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java [1256-1266]

 register(
     IS_NOT_EMPTY,
     (FunctionImp1)
-        (builder, arg) ->
-            builder.makeCall(
-                SqlStdOperatorTable.NOT,
-                builder.makeCall(
-                    SqlStdOperatorTable.OR,
-                    builder.makeCall(SqlStdOperatorTable.IS_NULL, arg),
-                    builder.makeCall(SqlStdOperatorTable.IS_EMPTY, arg))),
+        (builder, arg) -> {
+          RexNode isEmptyCall = builder.makeCall(
+              SqlStdOperatorTable.OR,
+              builder.makeCall(SqlStdOperatorTable.IS_NULL, arg),
+              builder.makeCall(SqlStdOperatorTable.IS_EMPTY, arg));
+          return builder.makeCall(SqlStdOperatorTable.NOT, isEmptyCall);
+        },
     PPLTypeChecker.family(SqlTypeFamily.ANY));
Suggestion importance[1-10]: 4

__

Why: While extracting the common OR expression into a variable improves readability slightly, the impact is minimal. The current implementation is already clear and the duplication is intentional for defining IS_NOT_EMPTY as the negation of IS_EMPTY. This is a minor style improvement rather than a functional or maintainability concern.

Low

@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 0ecc7f1

@Swiddis Swiddis added the enhancement New feature or request label Mar 2, 2026

@Swiddis Swiddis left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall solid, nice work. Good call on just registering this as an inversion of the existing function.

One chore to fix before this can be merged (in general we're gonna need green CI).

Ideally this could use a unit test, but doctest & existing isempty tests mostly covers everything we care about.

BWC tests failing due to flaky network issues, will just rerun them if they still fail once everything else is green.


```ppl
source=accounts
| eval temp = ifnull(employer, ' ')

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: I didn't realize that isempty counted blank strings, I thought it only matched empty (zero-length) strings.

Weird imo, not this PR's fault.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually thought isempty only matched null and zero-length strings, not blank/whitespace. Isn't that what isblank is for? Or am I misunderstanding something?

Comment thread docs/user/ppl/functions/condition.md Outdated
| True | Netagy | True | Netagy |
| True | Quility | True | Quility |
| True | | False | null |
+------------------+---------+----------------------+----------|

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chore: Doctests are failing because of bad table formatting at the corner.

Will have to fix that before this can be formally approved & accepted. (Can run doctest locally with ./gradlew doctest).

image

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the table corner formatting, doctest passes locally now.

Adds `isnotempty(field)` function to PPL that returns true when a field
is not null and not an empty string — the logical negation of
`isempty(field)`.

Implementation: NOT(IS_NULL(arg) OR IS_EMPTY(arg))

- ANTLR grammar: added ISNOTEMPTY token and parser rules in all three
  grammar modules (ppl, language-grammar, async-query-core)
- BuiltinFunctionName: added IS_NOT_EMPTY enum constant
- PPLFuncImpTable: registered Calcite implementation
- Integration test: testIsNotEmpty in CalcitePPLConditionBuiltinFunctionIT
- Documentation: added ISNOTEMPTY section to condition functions docs

Resolves opensearch-project#5182

Signed-off-by: Luca Cavenaghi <lucacavenaghics97@gmail.com>
@lucacavenaghi97 lucacavenaghi97 force-pushed the add-isnotempty-function branch from 0ecc7f1 to 5eed3f2 Compare March 3, 2026 07:44
@github-actions

github-actions Bot commented Mar 3, 2026

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 5eed3f2

@penghuo

penghuo commented Mar 3, 2026

Copy link
Copy Markdown
Collaborator

@lucacavenaghi97
NOT isempty(field) already provides the expected "not empty" semantics. Based on this behavior, a separate isnotempty function is not required.

Sample dataset

PUT notisempty_demo
{
  "mappings": {
    "properties": {
      "id": {"type": "integer"},
      "name": {"type": "keyword"}
    }
  }
}

POST notisempty_demo/_bulk?refresh=true
{"index":{}}
{"id":1,"name":"alice"}
{"index":{}}
{"id":2,"name":""}
{"index":{}}
{"id":3,"name":null}
{"index":{}}
{"id":4,"name":"bob"}
{"index":{}}
{"id":5}

Query

source=notisempty_demo
| where NOT isempty(name)
| fields id, name
| sort id

Result:

  • Returns only id=1 (alice) and id=4 (bob)
  • Rows with name = '', name = null, or missing name are excluded

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] isnotempty function

3 participants